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ERRATA  FOR  IMM-NYU  211 


p.  5,   line  11.     For  "  J"  a,  ,''*   read  "  T~   a,  " 


For  "  £3  a^  ,''*   read  "  J" 
X  I 


p.  11,  line  10.     For  "it  cannot  be  singular,"   read  "then  I  *■  H 

cannot  be  singular"* 

p.  13,  last  line  before  footnote.   For  "vector,"  read  "vectors". 

p.  11+,  formula  (2,7).   In  first  line,  delete  numeral  1  below  first 

plus  sign  in  third  member  of  the  equation. 
In  Second  line,  delete  comma  and  "N  =  1,2,.,." 

p.  26,  line  2.      Nximber  this  formula  as  (3.1^.)  * 

p.  27,  formula  (3«5)«   Close  parentheses  after  | |h| |  but  before 

exponent  2  in  denominator  * 

p.  27,  line  13 i     Insert  "P  =  [|h^.M"   before  "therefore". 

p.  28,  line  1  .     Delete  "P  =  [|h^.|]". 

11      n       St      t) 

p.  28,  formula  (3.7) •   For  ||p||,   read  ||p||  in  last  two  members 

of  the  inequality. 

p.  35 »  line  13 .     For  "number",   read  "numbers". 

p.  35,  line  5  from  bottom.   Delete  "of". 

p.  38,  formula  (1;.6).   For  "2llo|l^",   read  «2||p||^". 

p.  38,  line  2  and  line  3  from  bottom.   Delete  "with  high  probability 

(specifically,  a  probability  of  95°^o)," 

p.  I4.I,  line  10.  Insert  "i  =  "  before  "i^"  . 

p.  l|l,  line  11.  For  "1-th",   read  "i^-th". 

p.  Ul,  line  12.  For  "i-th",   read  "i^-th". 

p.  I4.I,  line  17.  For  element",  read  "component". 
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p.  [4.2,  line  10  ff.   Delete  paragraph  beginning  "Now  the  key..." 
p.  I|.2,  line  15.     Insert  "from  formulas  (5*1) ,  (5*2),  and 

(5.3)"  before  "that", 
p.  kk,   Note  (c).    Change  "ic""'-^"   to  "lO''-^"   in  second  line, 
p.  [^.8,  "formula  {6.14.).   Insert  "^  "  after  "H  "  . 

p.  Skt    line  3.      In  third  member  of  inequality,  change 

N 
»jjN-l„   ^^  u^s   -1„  ^ 
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A   THEORETICAL    C0hPARI30Ij    OF    THE   EFFICIENCIES 
OF   TWO    CLASSICAL   rfiiThCDj    AMD  A  KONTE    CARLO   MjiTKOD 
FOR    COIIPUTIEG   ONE    COMPONENT   OF    THE   SOLUTION 
OF   A   SET    OF    LINioAR   ALGEBRAIC   E.^UATIQNS 

J.    H.    Curtiss 

Summary «   In  this  paper  a  basis  is  established  for  a 
theoretical  comparison  of  the  amount  of  work  required  by 
three  widely  different  methods  of  calculating  (to  a  given 
accuracy)  one  component  of  the  solution  of  a  set  of  simul- 
taneous linear  algebraic  equations.   The  equations  are 
assumed  to  be  in  the  form  f  =  E^:   +   y>  where  y  is  a  given 
n-dimensional  vector  and  H  is  a  given  nz  n   real  matrix. 
The  amount  of  vrork  is  measured  by  the  number  of  multipli- 
cations required  under  the  most  unfavorable  conditions. 
The  three  methods  are  (a)  the  Gauss  elimination  m.ethod, 
(b)  the  particular  stationary  linear  iterative  method  de- 
fined by  the  recursion  formula  <^,t^-|  =  H^-^-  "*"  Yj  N  =  0,1,2,,,., 
and  (c)  a  Iionte  Carlo  method  vhich  consists  essentially  of 
a  statistical  process  for  estimating  the   iterates  of  H. 
The  amount  of  work  required  by  the  first  method  is  pro- 
portional  to  n  ,  v;here  n  is  the  order  of  the   matrix  H. 
The  amount  of  viork  required  by  the  second  metycr"  to  achieve 

a  predetermined  accuracy  is  given  by  an  expresi'ion  of  the 

2 

form  kn      +  n,    v;here    k   is   ordinarily   fairly   large.      The    amount 
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of  work  required  by  the  lionte  Carlo  method,  is  given  by  an 

2 

expression  of  the  form  n   +  n  +  b,  v/here  b  is  ordinarily 

a  very  large  niimber.   If  no  preliminary  preparations  aimed 
at  reducing  b  are  made,  then  the  amount  of  work  for  the 
Monte  Carlo  method  is  given  by  an  expression  of  the  form 
n  +  b. 

The  result  of  this  varying  dependence  on  the  dimen- 
sionality of  the  problem  is  that  the  Honte  Carlo  method 
is  the  ere  t  ically  more  efficient  tlir.n  the  other  two  methods 
for  sufficiently  laJ^;:",Q  values  of  n.   The  value  of  n  at 
which  one  method  beco}i.es  more  efficient  than  the  other 
depends  on  the  accuracy  w:i.th  which  the  solution  is  to  be 
computed . 

Upper  bounds,  ^^Jilich  are  actually  attained  in  special 
cases,  are  derived  in  this  paper  for  the  ataount  of  work 
required  by  the  iterative  and  the  stochastic  meti  ods. 
From  these,  break-even  points  on  the  range  of  the  dim.en- 
sionality  n  are  calculated  which  serve  at  least  as  indi- 
cations of  the  intervals  of  values  of  n  which  are  favorable 
for  each  of  the  three  methods.   A  table  in  Section  5  gives 
the  favorable  numerical  intervals  of  n  for  var.tcus  typical 
specifications  of  the  problem. 

A  feature  cf  the  presentation  is  the  development  of 
a  new  minimum-variance  arrangement  of  the  Monte  Carlo 
method  for  solvin;'  linear  equations,  which  exploits  in  a 


simple  way  an  initial  estimate  of  the  solution  to  reduce 
variance.  Tae    construction  of  the  Monte  Carlo  method  will 
be  found  in  Section  3.   In  Section  6,  this  Monte  Carlo 
arrangement  is  adapted  to  the  problem  of.  inverting  a  matrix, 
Section  6  also  contains  derivations  and  comparisons  of  cer- 
tain [general  linear  and  polynomial  iterative  methods  for 
matrix  inversion. 


k^ 


1.  Introduction.   Many  of  the  problems  of  numerical  analy- 
sis to  which  lionte  Carlo  methods  have  been  applied  belong 
to  the  following  general  type:   A  rule  is  given  whereby 
each  one  of  a'  set  of  real  or  complex  numbers  a.,,  a-,... 
can  be  computed.   It  is  required  to  compute  the  sum  of 
the  series  a-,  +  ap  +  . .  .  .   (The  series  may  be  finite  or 
infinite .  ) 

The  standard  method  of  stochastic  estimation  ("Monte 
Carlo"  method)  for  this  type  of  problem  consists  iii  selec- 
ting numbers  z,  and  p,  ,  k  =  1,2,  ,.  .,  such  that  z,  p,  =  a,  , 

p,  >  0,  Z  p,  =1.   Then  a  random  variable  J  is  set  up  with 
k:  -    ^  k 

the  probability  distribution  given  by  Pr(J  =  k)  =  p  , 

k  =  1,2,  ...  •   The  random  variable  z-.   will  obviously  have 

a  theoretical  mean  value  equal  to  the  sum  of  the  series 

a-,  +  ap  +  ...  .   The  statistic 

z  =  -A 2. ^ 

where  J.,,  J„,  ...  are  independent  replicas  of  J,  furnishes 
an  estimator  of  the  mean  value  of  Z-^.  (that  i,:;.  jf  the  solu- 

o 

tion  of  the  problem)  which  has  various  weil--knovjn  optim.um 
statistical  properties. 

It  is  hard  for  some  classically  train'^d  numerical 
analysts  to  see  hov;  Monte  Carlo  methods  can  ever  be  advan- 
tageous in  such  a  problem.   A  sor.iewhat  ovei'-simplif ied 
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version  of  their  reasoning  might  go  as  follows.   Let   s 

s 
be  chosen  so  that  2a,  gives  a  satisfactory  approximation 

to  the  sum  of  the  scries.   (If  the  series  is  finite,  let 

s  be  the  number  of  terms.)   Since  each  observed  value  of 

z  ,|.  conditionally  estimates  just  one  of  the  numbers  a,  , 

there  will  have  to  be  at  least  as  many  terms  in  the  sum- 

_        s 
mation  for  Z   as  in  Za,  «   Indeed,  because  of  statistical 
V       1  •'^ 

fluctuations  it  will  probably  be  necessary  to  make  very 

many  more  observations  on  z ,  than  there  are  terms  in  the 

s  _ 

finite  approximation  Sa,  ,  and  even  then,  Z  will  probably 

s 
not  be  as  good  as  2a,  .   This  means  that  the  comit  of 

X  ^ 

additions  alone  v/lll  be  very  much  greater  for  the  stochastic 
approximation  than  for  the  straightforward  method  of  solu- 
tion; and  then  too,  there  is  the  trouble  of  setting  up  the 
z,  's  and  Pi^ '  s  in  advance,  and  of  determining  stochastically, 
over  and  over  again,  the  value  of  J  to  use  in  the  obser- 
vations. 

The  case  against  this  argument  can  very  conveniently 
be  stated  in  terms  of  a  specific  problem  which  is  funda- 
mental to  this  paper.   It  is  the  problem  of  calculating 

N+1 
the  i-th  component  of  the  vector  K    9,  where  K  denotes 

the  2n  X  2n  matrix  [k.  .]  and  9  is  the  2n-dimensional  vector 

(t,,...,t„  ).   The  i-th  element  of  K    0,  which  we  write 
1    '  2n  ' 

as  CI   9].,  is   the  sura  of  (2n)^    term.s  of  the  type 

k..  k.  .   ...  k.  .    t.     •   The  stochastic  estim.ation" 
11^  ^1^2      ^irN+1  ^N+1 

"See  Curtiss  [3]»  (The  square  brackets  refer  to  the 
references  at  the  end  of  the  paper.) 


6. 


is  accomplished  by  selecting  numbers  z.  .  and  p.  ., 

i,j  =  l,...,2n,  such  that  z^jP^j  ~  ^ii'  "^^^  Pij  =  ^' 
2p.  .  =  1  for  all  i.   Then  a  family  of  random  variables 

J  ,J  ,J,,...,J„  ,   is  set  up  in  such  a  way  that  it  repre- 
sents  a  Markov  process  (or  "random  x^alk")  with  states 
(resting  places)  designated  by  l,2,...,2n  and  with  station- 
ary transition  probabilities.   The  specification  of  the 
joint  probability  distribution  is  accomplished  by  assigning 
an  arbitrary  distribution  to  J^  (but  with  Pr(J^=i)  /   0), 
and  thereafter  using  the  equations"  Pr(  J^^-j^=  j  |  Jj^=i  )  =  p^., 
i, j  =  l,...,2n.   Finally,  a  random  variable 


'^o"^l**"^N+l    '^o'^l  '^1*^2'**  '^N'^N+l  *^N  +1 

is  set  up.   It  is  now  easily  seen  from  the  definition  of 
mean  value  in  probability  theory,  that 

E(z1j  =i)  =  K?^^"^9]..   We  use  (Z^+. .  .+Z^  )/v   as  the  esti- 
mator of  K   9].,  where  Z,,...,Z   denote  v  independent 
determinations  of  Z. 

The  various  possible  values  of  k..  k k.  k.    t. 

correspond  to  the  values  of  a,  in  the  previous  more  general 


"By  Pr(z|b)  v;e  mean  the  conditional  probability  of  the 
event  a,  given  that  the  event  b  has  occurred. 

By  E(xia),  we  mean  the  mean  value  of  the  conditional 
distribution  of  the  random  variable  X,  given  that  the 
event  a  has  occurred. 


formulation.   Let  us  think  of  these  possible  values  as 

renumbered  in  a  linear  and  serial  order,  using  a  sxngle 

index  k.   There  will  be  s  =  (2n)"'    suci-  numbers.   (They 

may  not  be  all  distinct.)   Then  the  (2n)^    correspondingly 

renumbered  products   P.- .•  •••?.;  a  play  the  role  of 

^^1    ^N  N+1 

p, ,Pp,...,p   in  the  previous  formulation,  and  the  various 

possible  values  of  the  vector  random  variable 

J  =  (J  ,  J,  ,  . .  . ,  J.,^-,  )  correspond  to  the  values  of  J  in  the 
=     o   1      N+1 

previous  formulation. 

It  now  begins  to  be  evident  that  our  formulation  of 
the  general  summation  problem  at  the  begin  ing  of  the 
section  was  deceptively  over-simplified.   In  a  multi- 
dimensional summation  problem,  the  following  two  factors 
may  come  into  play  on  the  side  of  a  Monte  Carlo  method  of 
solution: 

(a)  If  the  calculation  of  each  a,  is  a  complicated 

one,  it  may  be  possible  to  arrange  things  so  that  the 

calculation  of  each  observation  on  z,  is  very  much  simpler 

than  the  calculation  of  the  corresponding  term  a,.   This 

will  m  particular  be  the  case  if  the  calculation  of  each 

a,  involves  the  formation  of  a  continued  product,  because 
k 

then  part  of  the  worl:   of  calculating  the  product  can  be 
sidestepped  in  the  stochastic  process  by  using  the  multi- 
plicative law  of  probabilities. 


(b)  Some  of  the  numbers  a.  In  the  finite  approxima- 
tion to  Za,  may  be  very  unimportant  and  need  not  be 
represented  in  the  statistical  estimator  at  all.   The 
stochastic  estimation  process,  if  properly  set  up,  will 
automatically  take  care  of  this  by  making  the  appearance 
of  the  representative  of  a  non-essential  term  a  very  rare 
event . 

We  add  here,  more  or  less  as  an  aside,  the  remark 
that  ^^rhen  the  calculation  of  each  particular  z   is  much 
simpler  than  that  of  the  corresponding  aj,  then  the 
problem  of  the  accumulation  of  round-off  errors  may  not 
be  nearly  as  serious  for  the  stochastic  method  as  it  is 
in  the  direct  computation. 

All  of  these  factors  favoring  the  Monte  Carlo  method 
are  pi'esent  in  the  case  of  the  problem  R    9]  .  and  in  the 
related  problem  of  solving  systems  of  linear  algebraic 
equations.   The  factor  (a)  is  usually  particularly  in 
evidence  in  numerical  problems  of  stochastic  origin,  and 
indeed  it  is  in  such  problems  that  the  Monte  Carlo  Method 
has  had  its  chief  successes.   Matrix  problem.s  can  always 
be  throvm  into  a  form  in  wbich  they  are  nuinerically  equi- 
valent to  pi'oblems  with  a  stochastic  pedigree.   We  shall 
exploit  that  fact  in  the  present  paper  to  obtain  a  favor- 
able environment  for  the  com.parisons  to  be  made. 

But  we  cannot  conclude  this  introduction  without 
making  a  remark  which  is  on  the  negative  side  as  far  as 
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Monte  Carlo  methods  are  concerned.   It  certainly  would  seem 
that  v/r.enever  Monte  Carlo  methods  appear  to  advantage  in 
sumraation  problems,  factor  (b)  ai^ove  must  be  playing  an 
important  role,  becau3C  otherwise  the  criticism  regarding 
the  necessary  momber  of  addition  operations  required  for 
any  reasonable  degree  of  accuracy  in  the  Monte  Carlo 
approach  V70uld  be  valid.   But  if  that  is  so,  i^rhy  cannot 
a  detei^ministic  method  be  devised  which  will  ignore  the 
unimportant  terms  and  be  even  more  efficient?   The  author 
suspects  that  whr.t  we  now  need  is  a  more  i.aghly  developed 
deterministic  theor'y  of  quadrature  and  of  linear  computation 
in  many  dimensions.   Vi'hen  this  becomes  availaole,  the  Monte 
Carlo  method  may,  at  least  for  miatrj.x  problems,  lose  the 
very  modest  advantages  wi^ich  will  be  claimed  for  it  in 
this  paper. 

2«  The  nuinerical  problem  and  its  non-stochastic  solution. 
Throughout  tne  paper  we  shall  cor;ti^.^ue  to  denote  matrices 
by  capital  letters  ana  their  elements  by  the  corresponding 
lower  case  letters;  thus,  for  example,  H  =  [h.  .].   We 
represent  vectors  by  lovjer  case  Greek  letters  and  their 
components  by  the  corresponding  Roman  letters;  thus  for 
example,  cf  =  (x-,  ,  Xp,  . . . ,  x  ).  Vie    shall  also  find  it  con- 
venient occasionally  to  designate  the  elements  of  a  matrix 

by  double  subscripts  affixed  to  the  symbol  for  the  matrix 

-1  2 
(thus,  H.  .  or  [(I-H)   H  ].  .).   Furthermore  we  shall  frequently 
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designate  the  components  of  a  vector  by  a  similar  subscript 

notation  (thus  ^  =  (c,]  ,  ^]  ,  ...,  ^]  ),   All  vectors  will 

be    real    and  n-dimensional    and   all   matrices   will   have    only 

real   elements. 

By   II  H  II     we    shall   mean  max   2  |h.  .|,    and  by    H'?;  II  ,    we 

1      j      iJ 

shall  mean  maxfx.  j.      It    is   obvious   that    ||  Et.\\  ^    |i  H  ||     ||^  || 

and  that   ||a+b||^    ||  a||  +  ||b||   .      It    is      well-knovjn""'  that 

11  AbII  ^    II  All      I!  bII   t    therefore,    by   induction,     ||  H^  ||    ^    I|h  ||^    • 

Tlie    numerica]    problem  -,'ith   which   we    shall    be   mainly 
concerned   is    tbat    of    solving   the    linear    system 

(2.1)  AE     =    ^i, 

for    %.  ,    whei'-e    A   is    a   given   non-singular   n  x    n     matrix   and 
is    a  given   vector,      VJe    assume    that    this    system  has    been 
thro'-.^ni   into    the    form 

(2.2)  -i,  =   Hf,     +   Y    , 

where   H   =    [h.  _.]    is    an     nxn     mc;trix   vith    the   property   that 


{2,3)  i!  H   !!    =  max   S  Ih       j  ^    1    . 


'See    for   exair.ple    Couraiit    ana   Hllbert    [IJ*    P«    16,    footnote. 
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It  is  beyond  the  scope  of  this  paper  to  give  an 
extended  discussion  of  the  nethods  of  passing  from  (2,1) 
to  (2.2),  but  a  few  general  remarks  on  the  subject  are  in 
order  at  this  point.   In  the  first  place,  it  is  always 
theoretically  possible  to  transform  the  problera  repre- 
sented by  (2.1)  in  this  manner.   Indeed,  let  H  be  any 
matrix  whatsoever  with  the  property  (2.3 )•   (For  instance, 
let  H  =  dl,  where  I  is  the  unit  matrix  and  d  is  a  scalar 
lying  between  0  and  1.)   It  is  known  that  if  H  satisfies 
(2.3),  it  cannot  be  singular".   Let  I'i  be  defined  by  the 
equation  I  -  i'4A  =  H.   Tl'iis  says  that  M  =  (I-H)a"  ,   There- 
fore M  cannot  be  singular.   The  system 

?  =  HI  +  MNi.  =  (I-m)C  +  Mn\.  , 

is    just   the    same    as   the    system 

I4AC  =  Mn    , 

and  since  II  is  non-singular,  this  system  is  precisely 
equivalent  (2.1). 

But  in  practice  it  is  not  feasible  to  set  up  an 
arbitrary  II  satisfying  (2.3)>  s.nd  then  to  determine  I-I  as 

"0.  Taussky-Todd,  [9]. 
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above.   The  reason  is  thr'it  the  formula  M  =  (I-H)A~   pre- 
supposes a  knowledge  of  A   ,  and  in  the  presence  of  this 
the  original  problera  becori:es  trivial.   Thus  it  is  natural 
to  think  of  M  as  being  chosen  first,  the  choice  being  made 
in  such  a  way  that  I  -  MA  =  H  has  suitable  properties. 
There  are  a  number  of  different  procedures  in  the  literature 
of  linear  computation  for  arriving  at  an  appropriate  choice 
of  M,   For  ezaiTiple,  if  A  has  dominant  diagonal  -  that  is, 
if  |a..|  >  2  |a.  .1,  i  =  l,...,n,  -  then  M  can  be  chosen 
as  the  inverse  of  the  principal  diagonal  matrix  whose 
principal  diagonal  is  that  of  A.   This  will  obviously  insure 
that  II  H|1  <  l'"'. 

There  are  nuraerous  non-stochastic  methods  of  solving 
(2.1)  '»  VJe  shall  here  restrict  our  considerstions  mainly 
to  a  class  of  metl  ods  known  as  linear  iterative  processes. 
An  effective  Monte  Carlo  method  for  the  problem  (2.1)  can 
be  based  on  this  type  of  process,  as  we  presently  shall 
see  • 

The  general  stationary  linear  iterative  process  for 
solving  (2.1)  is  ari\ived  at  by  throwing  (2.1)  into  an 
equivalent  fon.i  which  has  the  appearance  of  (2.2),  but  with 
H  restricted  only  by  the  requirement  that  its  eigenvalues 
all  lie  in  the  unit  circle.   An  initial  estimate  £   of  the 


Further  discussion  will  be  found  in  any  good  treatise  on 
numerical  analysis  which  deals  with  Iterative  methods  of 
solving  linear  equatioxis.   See  for  example  Householder  [7] 
and  Kilne  [8].   See  Porsythe  [6]  for  further  references. 

See  previous  footnote  for  references. 
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solution  is  mads,  and  successive  approximations  £,^,  E,   , 
are  then  define:!  by 

(2.[^)  Cjj+i  =  H^^^  +  Y  ,   N  =  0,1,2,...    . 


If  f'   denotes  the  solution  of  the  equations  (2.2),  then 
clearly,  since  ^^  =  E^^  +  r, 


(2.5)  £^  -  Cn  =  H(?^  -  Ch-i) 

=  ^^(^co  -^N-1^ 
=  ^''(?oo  -  ^o^ 


Thus  the  condition  for  convergence  for  any  starting  vector 
?  is  that   lim  H  =  0.   The  well-known  necessary  and 

sufficient  condition  [?]  for  lim  H  =  0  is  that  all  the 

N^oo 

eigenvalues  of  H  should  lie  in  the  unit  circle,  vdnich 
explains  Ihe  requirement  on  H  imposed  earlier  in  this 
paragraph. 

But  in  the  present  discussion,  ^^re  go  one  step  beyond 
this  requirement  and  insist  that  ||h|I<  1.   Since 
||H  II  g  II  H  II  \    this  condition  will  certainly  insure  that 
H^^  -.  0. 

For  purposes  of  error  analysis   it  is  advantageous 
now  to  introduce  the  residual  vector 


'By  error  in  this  paper,  we  shall  always  mean  truncation 
error  or  statistical  error  or  both  at  once.   There  will  be 
no  study  of  round-off  error,  nor  of  the  effect  of  miscel- 
laneous arithmetical  mistal:es. 


^.-f^- 


rn 


■-■.^s:-iiiy^  f^v!T 


fi. 


n^ 


Ik* 


(2.6)       Pn  =  Y  -  (I-H)^N 


.  =  H^j^  +  Y  -  ?i^  =  '-N+1  "  "^'N'   N  "  0.1,2,.. 


The  vectors     p^  are  of  course  always  computable  at  any  stage 
in  the  iterative  solution.   If  T,,.  -^>   '?_  =  (I-H)  y,    then 

In        CO 

obviously  p,,  -^  0.   The  converse  is  also  true,  because 
(I-H)"  p^   =  $   -  n^T*   Thus  pjj,  or  ||  pj^||  ,  are  logical  measures 
of   the  error  in  the  N-th  approximation  to  the  solution.   It 
is  to  be  noted  from  (2.. '4.)  and  (2.6)  that 


(2.7)  Pn  =  ^^+1  -  S  =  ^«'n  t  ^^  '   ^«?N-i  ■"   Y) 

^  ^^'n  -  ni-1^  =  ^Pn-1  '    N  =  1,2,*.. 

-   •  •  •    -  ^^  Po   • 

Prom  (2.6)  and  (2.?)  it  follows  that  the  successive 
approximations  -'^  generated  by  (2.1+)  can  theoretically  also 
be  generated  in  the  following  manner:  Select  'h      as  before,  and 
compute  p   from  the  definition  of  residual  vector, 
p   =  Y  -  (I-H)H:,  •   Then  conduct  the  iterations  by  means  of  the 
pair  of  formulas 

(2.8)  ^-N+l  =  ^^N  ^  Pn  ^   N  =  0,1,..,   , 

(2.9)  f'j^^^L  =  Hp^   ,   N  =  0,1,... 
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We  note  that  by  back  substitution,  we  find  that 


^2-10)  ^m  =  ?o^Po^Pl^----*-PN"^o-'(I^K-'---^^^'fio  • 


In    actual  practice,  the  customary  running  check  on  the 
accuracy  of  the  solution  consists  in  computing  ||  p  ||from  time 
to  time  to  see  hovj   small  it  is.   The  iterations  are  stopped 
when  II  a- 1 1  reaches  a  nredetermined  order  of  smallness.   If 
(2.L|.)  xvere  used  for  the  iterations,  the  computation  of  p   would 
be  done  by  using  the  formula  a^   =  Crj^-,    -   ^jt»   If  (2.8)  and 
(2.9)  were  to  be  used,  it  would  be  advisable  to  compute  test 
values  of  ||  p„||  from  the  definition  of  p„  (that  is,  from  the 
formula  p,,  -  y  ~  (I-H)Cm)>  rather  tha:i  to  accept  the  values  of 
II  pKrllgiv®^  by  (2.9).   We  shall  com.e  back  to  this  point  in  a 
moment . 

An  a  priori  truncation  error  analysis,  made  with  the 
purpose  of  estimating  the  number  of  iterations  which  will  be 
required  to  achieve  a  given  accuracy,  can  be  conducted  either 
in  terms  of  p   or  in  terms  of  the  error  vector  L       -    E^,      If 
the  size  of  ||  d .  ||  is  to  be  the  criterion,  then  we  use  (2.?) 
to  obtain  the  very  simple  ectimate 

(2.11)  llpjjl  ^  llHll^llpJI  . 

But  if  the  deviation  of  £  ,  from  ^    seems  to  be  a  more  appro- 
priate  or  convenient  measure  of  the  truncation  error,  then  we 


16. 


use  the  fact  that  (I-H)~  p   =  ^_  -  ^      •   Substituting  this 


into  [Z.SK    we  find  that 


(2.12)       ?co  -  ^N  =  hN(I-H)-\ 


Since  (I-H)"-'"  =  I  +  H  +  H^  +  .  .  . ,  it  follows  that 

||(I-H)"^||     g    11x11+    I1h1I+    |1h||2    +    ...    =    (1-||h||    )"^.      Therefore 

(2.13)  H^    -,,11    ^^^^[^    II  Poll    . 

It  is  perhaps  worthwhile  to  point  out  here,  by  way  of 
an  aside,  that  the  inequalities  (2.11)  and  (2.13)  hold  for 
any  one  of  the  various  matrix  and  vector  norms  in  common  use", 
and  not  just  for  the  norm  1|h1|  =  max  z|h.  .|»   We  are  using  this 

•      •    -LI 

1   J   ■^ 
particular  norm  here  because  of  a  special  application  it  has 

to  the  Monte  Carlo  method,  which  will  be  brought  out  in  the 
next  section. 

The  iterative  formulas  (2.8)  and  (2.9)  are  (so  to  speak) 
homogeneous,  and  therefore  look  easier  to  use  than  (2.L|.).   But 
(2.8)  and  (2.9)  have  the  great  disadvantage  of  being  not  self- 
correcting  in  case  a  mistake  is  made  at  one  stage,  whereas 
(2.[|.)  does  have  this  property.   Suppose  for  example  that  for 


"See  Householder  [7]j  PP  •  38-^l-« 
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some  N,  Pv]-+-i  is  riistakenly  computed  as  a  zero  vector  in  using 
(2.9).   Then  since  all  subsequent  vectors  p^  will  equal  zero, 
it  is  obvious   from  (2.10)  that  £,„   will  be  irretreviably  wrong. 
But  if  a  mistake  is  made  in  com.puting  -E,-^,-,    by  (2.[|.),  the  subse- 
quent effect  is  like  starting  over  a^^ain  with  another  E,    • 

Therefore  we  are  not  proposing  (2.8)  and  (2.9)  as  a 
practical  substitute  for  (2.[|.)  for  a  ncn-stochastic  numerical 
solution  of  the  problem  ^  =  H^  +  y.   Our  real  purpose  in  intro- 
ducing the  alternative  iteration  formulas  was  to  develop  the 
representation  of  ^„  given  by  formula  (2.10).   This  representa- 
tion seems  to  be  an  advanta'^eous  one  upon  which  to  construct  a 
Monte  Carlo  solution,  as  we  shall  see  in  the  next  section. 

It  is  not  without  interest,  however,  to  point  out  that 
if  the  amount  of  work  involved  in  a  computing  job  is  measured 
in  any  sensible  way,  it  theoretically  requires  no  more  work  to 
use  (2.8)  and  (2.9)  up  to  a  predetermined  value  of  N  than  to 
use  (2.I1).   This  assumes  that  check  values  of  p  will  not  have 
to  be  computed  from  the  definition  of  residual  vector  from 
time  to  time  in  the  use  of  (2.8)  and  (2.9).   In  this  paper,  we 
shall  measure  amount  of  work  by  merely  counting  up  the  number 
of  multiplications  required  under  the  worst  conditions,  assum- 
ing no  zero  or  unit  elements.   Additions  and  subtractions  will 
be  ignored.   A  division  or  reciprocation  will  count  as  a  single 

multiplication.   To  arrive  at  ^^   by  (2.i|),  starting  with  4  , 

2  2 

requires  n  multiplications  per  iteration,  or  Nn   in  all.   To 

arrive 


15. 


at  ;p^  by  (2.8)  and  (2.9)  without  computing  any  intermediate  or 

2 
final  check  values  of  p,,  takes  n  multiplications  for  p  ,  and 

2 
thereafter  (N-l)n  multiplications  for  each  iterative  determin- 

2 
ation   of  p„.   3o  once  again  the  count  is  Nn  « 

Actually,  in  later  sections  we  are  going  to  set  up  an 

error  analysis  which  implies  that  p   will  always  have  to  be 

computed.   The  use  of  (2.4.)  ^^fill  be  tacitly  assumed,  so  n 

extra  multiplications  will  have  to  be  added  to  the  total  count. 

3*  Stochastic  solution  of  the  problems.   It  should  be  apparent 
from  the  foregoing  section  that  even  if  one  restricts  oneself 
to  the  class  of  stationary  linear  iterative  processes,  there 
is  literally  an  uncountably  infinite  number  of  methods  for 
solving  the  problem  A  ^  =  V( ,  corresponding  to  the  different 
possible  choices  of  H  or  II.   An  even  more  disconcerting  fact 
than  this  was  implicit  in  the  discussion  in  the  foregoing 
section.   It  can  be  expressed  in  the  form  of  a  theorem:  "Given 
any  one  way  of  solving  A^  =  >7y.,  there  is  always  a  much  better 
way."   For  given  any  one  H,  a  happier  choice  of  H  from  the 
standpoint  of  the  error  analyses  given  by  (2.11)  and  (2.12) 
always  exists. 

Under  such  circumstances  it  is  evident  that  some  strict 
ground  rules  are  required  if  meaningful  comparisons  are  to  be 
made  between  methods.  This  statement  applies  even  to  compari- 
sons within  very  special  classes  of  classical  methods,  and  it 
is  especially  relevant  when  an  utterly  unorthodox  method  such 
as  the  Monte  .Carlo  Method  is  to  be  brought  into  the  picture. 
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We  propose  then  to  adhere  to  the  following  set  of  rules: 

(1)  It  will  be  assumed  that  the  problem  is  given  in  the 
form  C  =  H^  +  Y,  with  ||h||  <  1. 

(2)  The  primary  comparison  will  be  between  a  Monte  Carlo 
Method  for  solving  ^  =  Hl^  +  Y  SLnd   a  stationary  linear  iterative 
method  of  the  type  described  in  the  preceding  section.   Both 
methods  will  be  based  on  the  particular  H  given  in  (1).   The 
linear  iterative  method  will  be  defined  by  the  recursion 
relations  Cvt^t  =  H^rj  "*"  Y*  N  =  0,1,...,  or  alternatively,  by 
the  formula 

(3.1)        ^'N  "  ^o  ""  (I+H+...+H^-l)p^  , 

where  -T   is  the  initial  estimate  and  p   is  the  initial  residuaJ. 
o  o 

vector  (see  equation  2.10).   The  Monte  Carlo  Method  will  consist 
in  effec.t  of  a  statistical  estimation  of 


(3-2)     -   =  ?  +  (I  ^  H  +  h2  +  ...)  p   , 


using  the  same  H,  same  ^   ,  and  same  p   as  in  (3.I). 

(3)  No  speculation  will  be  permitted  as  to  the  existence 
of  a  better  H  on  which  to  base  either  the  stochastic  or  the 
non- stochastic  m.ethod. 

(il)  The  measure  of  approximatio' ^  used  in  the  case  of  the 

iterative  method  will  be  l|j^    -  5^-J|  .   The  measure  of  approxi- 

"00     N 

mation  used  in  the  stochastic  method  will  be  I  ^,   ]  .  -  ^  I  , 

'   00  1     V  ' 
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where  "■■>       ]  .  denotes  the  i-th  component  of  the  solution  vector 
oo  1 

^,   ,  and  Z   is  its  statistical  estimator,  based  on  a  sample 
of  size  V. 

(5)  To  furnish  some  contact  with  the  large  family  of 
alternative  methods  of  solving  ^  =  H^  +  n.  ,  a  simple  direct 
method  of  solution  will  be  brought  into  the  comparison.   For 
siri'plicity,  it  will  be  assumed  that  ti.i  s  direct  method  gives 
the  exact  solution.   Round-off  error  will  be  ignored.   The 
direct  method  v;hich  we  select  is  the  Gauss  Elimination  method, 
because  for  present  purposes  it  seems  to  be  as  good  as  any 
other  and  better  than  most. 

(6)  As  previously  stated  in  Section  2,  the  amount  of  work 
required  for  a  computation  ivill  be  ireasured  only  by  the  nuitiber 
of  multiplications  required  in  the  worst  cases,  counting  a 
reciprocation  or  division  as  one  multiplication.   In  counting 
multiplications,  the  possibility  of  unit  or  zero  factors  is 
not  taken  into  account. 

( 7 )  It  will  be  assumed  that  the  problem  is  to  find  only 
one  com.ponent  of  the  solution  of  <?,  =  H^  +  Y« 

It  is  recognized  freely  that  the  last  restriction  on 
the  comparison  is  a  strange  one.   It  is  made  because  the  ques- 
tion of  efficient  Ilonte  Carlo  estimation  of  all  components  of 
the  solution  simultaneously  has  not  yet  been  adequately 
investigated.   Of  course,   separate  statistically  independent 
estimations  can  be  made  for  each  of   the  n  components  of  the 
solution.   This  would  multiply  the  measure  of  work  which  we 
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shall  derive  for  the  Monte  Carlo  method"  by  a  factor  of  n. 
At  the  S£ijne  time,  for  a  given  sample  size  v,  the  probability 
that  all  estimations  fall  within  preassigned  limits  of  error, 
will  be  smaller  than  it  is  for  the  estimation  of  a  single 
component"".   Therefore  v  should  be  correspondingly  increased. 
But  it  is  almost  surely  inefficient  to  use  separate  independ- 
ent estimators  for  each  component  of   the  solution.   It  seems 
intuitively  clear  that  data  obtained  in  the  course  of  esti- 
mating one  component  should  be  used  again  for  other  components 

With  these  preliminary  comments  out  of  the  way,  we  pro- 
ceed to  set  up  the  Ilonte  Carlo  estimation  of  i^   ]  .  . 

The  standard  method  of  estimating  K   9].,  where  K  =  [k.  .] 

a         -^ 
Is  a  given  2n  x  2n  matrix  and  9  =  {t-.,.,,,t^^)    is/Sn-dimens 

sional  vector,  has  already  been  described  in  Section  1»   Here 

we  recapitulate  it  briefly.   Numbers  z .  .  and  P^  •>  ij  J=l,  «.  •  •jSn, 

are  chosen  such  that  z .  .p .  .  =  k^^^j  with  P^^  •  i  0,  2  p  .  =  1. 

A  Markov  process  with  states  1,2, ...,2n,  and  with  the  matrix 

[p.  .1  as  its  matrix  of  transition  probabilities  is  set  up. 


"That  is,  the  measure  of  the  vjork  for  the  purely  stochastic 
part  of  the  solution.   This  is  represented  by  the  third  quantity 
in  the  siim  on  the  right  side  of  Ck'S),    o^  of  (l4-»6),  in  Section 
L|.,  below. 

""The  question  involved  here  is  that  of  the  distribution  of  the 
extreme  absolute  value  of  n  normally  distributed  independent 
random  variables  with  zero  means  and  differing  variances. 


The  re-use  of  samples  to  estimate  various  components  simul- 
taneously is  discussed  briefly  in  [[|.]  . 
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Let    J   ,    J,,     ...,    J  be    a   family    of    random  variables   which 

represent    the   process,    in  the    sense    that    Pr  ( J,  _j_  =  j  |  J,  =i )    =  p.  . 

1,  j   =  1,  . .  . ,  2n,      The    random  variable   Z-z,    -r'z.^^»,,z^-r        t_ 

J^Ji  J1J2        Vn+1  ^ 

has  the  property  that  E ( Z 1  J  =i )  =  K    S]^» 
Consider  now  the   2n A  2n  matrix 


N+1 


(3.3) 


and  the  vector 


K  = 


H 


0  I  I 


{l'\\.) 


9  = 


Po 


where  His  the  matrix  of  the  equation  <^5=HF,  +  ■',  lis  the 
r).%rv     -unit  matrix,  and  p   is  the  residual  vector  corresponding 
to  the  initial  estimate  "^   of  the  solution  of  these  equations. 
(That  is,  p   =  Y  -  (I-H)  S^*)   Then  it  is  easily  shown  that 


K 


N+1 


H 


o 
N+1 


0 


I+H+...+H 
I 


N 


Therefore, 


K^^^S 


N, 


(I+H+...+H  )p 
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Our  Monte  Carlo  solution  of  the  equations  ^  =  Kf  +   y 
consists  of  statistically  estimating  the  i-th  component  of 
this  vector  K   9,  adding  this  statistical  estimate  to  g  ]., 

JO  '01 

and  with  reference  to  (2.10)  or  (3'1),  using  this  sum  to 

approximate  thereby  the  i-th  component  of  the  solution  vector 

f,   •   More  specifically,  we  set  up  the  numbers  z.  .  and  p.  . 
CO  -^  •■  '  ■"  1  J       1  J 

for  the  special  matrix  K  appearing  in  (3«3)'   For  each  sample 

random  walk,  represented  by  a  determination  of  the  vector 

random  variable  J  ,  J^  ,    ....  J.r ,  t  >  made  with  J  =  i,  we 

o   1        N+1  o 

compute  the  statistic 


"'  n  J  •      ^  ~   "''  J  TT^TT***^TT         T 

U  1        °  1     "^0-^1  '^1-^2  "^N-^N+l   "^N+1  ' 

in  which  t_    is  the  J,.  , -th  component  of  the  vector  9  given 

given  by  (3.i|-)»   The  conditional  mean  value  of  this  statistic, 
given  that  J^  =  i,  is  ?^]^  +  (I+H+...+H  )Pq]^,    or  5jj+2.^i  * 

The  statistical  estimation  of  ^i^,-,  is  accomplished  by 

taking  the  average  of  v  independent  determinations  of  the 

random  variable   5  ].  +  Z,  which  we   denote  by  5  ] ^ +  Z^  , 

o-"  1     '  "^   o  i   1  * 

^  ].  +  Z-,  ...,  t   ].    +  Z   .   This  average  takes  the  form 


Z-i+Zo"*-.  .  .+z 

7   =  c     T   +   1   2 V 


Of  course,  Z   directly  estimates  or  approximates 

?TiT_L-i  ]  •  and  not  the  solution  component  ^^   ]  .  .   But  we  now 
N+1  X  ^       00  -"i 
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eliminate  the  truncation  error  completely  from  consider- 
ation in  the  Monte  Carlo  solution  by  assuming  that  N,  although 
finite,  is  so  large  that  ll'^^— ,  -  ^m+i  II  is  completely  negligible. 
Prom  (2.12)  it  is  obvious  that  this  can  always  be  done.   For 
all  practical  purposes  then,  Z  will  be  an  estimator  directly 


of 


^co^i  =  l^^  ^""^h   =  ^o^i  ^    (I^H+h2-h...)pJ. 

N^oo 


This  is  the  i-th  component  of  the  vector  which  appears  in  (3.2) 

We  shall  now  discuss  the  choice  of  the  numbers z.  .  and 
p,  .  with  refez^ence  to  the  special  matrix  K  now  under  consider- 
ation.  Obviously  it  is  necessary  to  choose  z.  .  and  p.  .  so 

ij     ^ij 

that  z.  .p.  .  =  h.  .,  i.i  =  l,...,n,  and  z.  .,  p.  .,   =  1, 
ij^ij     ij'   '^     )         >    )  i,i+n^i,i+n 

Moreover,  for  i  >  n,  z .  .p .  .  =  1  if  i  =  j  and  otherwise 

J     "J 

z.  .p.  .  =  0. 
ij  ij 

Within  these  limitations,  there  are  of  course  an  infinite 

number  of  -oossible  choices  of  the  numbers  z.  .  and  p.  ..   It 

seems  likely  from  evidence  of  various  types  that  an  optimum 

choice,  or  at  least  a  near-optimum  choice,  in  the  present 

instance  consists  in  letting  p.  .  =  |h. .|,  i,j  =  l,...,n, 

p.  .  =  0,  i  =  l,...,n,  j  =  n+1,  n+2,  ...,  n+i-1,  n+i+1,  ...,  2n, 

•^  n 

^ij  ^  °'  i  >  n,  j  7^  i  and  P^^^+n  ^  "'■  '  .?  ^ij'  ^  ^   l,...,n, 

p.  .  =  1,  i  >  n.  We  must  defer  a  complete  discussion  of  this 
choice  of  the  number  p.  .  to  another  paper  which  is  now  under 
preparation. 
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^  II    11 

It    Is    to   be   noticed   that        Z   p.  .    ^    l|H||<   1,    i    =   l,...,n. 

j=l    ^^ 

VJith  this   choice    of   the   nmnbeisp.  .   it   follows   that 

i,j»  1,  ...,  n,  -J 

z.  .  =  ■+  1  and  z.  .  ,   =  l/p .  -  ,   for  i  =  l,...,n.   We  henceforth 
13    -   <^     1,1+n    '-^iji+n  *    ' 

shall  usually  drop  the  subscript  on  p   and  write  p  =  (r, ,...,r  ) 
Then 


(3.3) 


'"^o'h      h'^2    '"      -^N-^N+l  ""^N+1 


J,T , -,  -  1,2,  ...jn 


N+1 


+ 


J  .. 

N"-1 


^J  .,  ,  n+J 


'      "^N+l  ~  P"*"!*  . . . ,  2n   , 


where-   N"    is   the  "duration"  of  the  random  walk  represented 
by  J  ,  J-,,...;  that  is,  N"  is  the  number  of  times  a  state  in 
the  first  n  statesis  visited  before  any  state  in  the  last  n 
states  is  visited".   (The  present  set-up  of  the  liarkov  process 
makes  the   last  n  states  play  the  rcle  of  absorbing  states.) 


We  shall  here  count  in  the  first  state  -  the  state  from 
which  the  random  v;alk  starts  -  in  computing  N",   Thus  if  [|.  non- 
absorbing  states  are  visited  including  the  starting  point  ajid 
then  absorption  takes  place,  then  N"  =  Ij.,  and  J-  is  the  last 
one  of  the  J's  taking  on  one  of  the  values  1,  2,  ••.,  n,  and 
Jj  is  equal  to  n  +  J^,   This  convention  concerning  N"  is 
adopted  so  as  to  conform  with  the  definition  given  in  Curtiss, 
[3],  and  so  as  to  simplify  later  formulas  slightly. 
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It    is    to   be    noted    at    this    ooint    that 
,     ,  llpil  lip  II 


Z      < 


-  mm  p.    .  ,  n 

1           '                    1    -  max  Z     p .  . 

i  j=l      ^J 

lip  II 


l-tlHiil 

This  means  that  E(Z  |j  =  i)  exists  and  is  uniformly 
bounded  for  all  values  of  N. 

Let  V  denote  the  conditional  variance  of  the  random 
variable  Z,  relative  to  the  hynothesis  that  H  =  i.   This  is 

'  "'  •  o 

a  measure  of  dispersion  of  Z  defined  by  v  =  E-[[Z-E(Z)]  |j  =i}  = 
=  E(Z  I J  =i )  -  (f,  ]^  )  •   It  is  necessary  for  later  develop- 
ments to  obtain  an  appraisal  of  v.   The  explicit  formula  for 
V  is  known",  but  in  the  present  situation  a  rough  metiiod  of 
appraisal  which  bypasses  the  formula  x-jill  give  just  as  good 
a  bound  for  v  as  can  be  obtained  from  the  explicit  formula 
for  V. 

The  rough  method  is  this.   Given  any  random  variable  X 

distributed  on  the  interval  (-a, a),  it  obviously  follows  from 

2       2     2 
the  definition  of  mean  value  that  E(X  )  g  E(a  )  =  a    and 

E(X-E(X))^  ^  a^  -  [E(X)]^.   Thus  the  highest  value  that  the 

2 
variance  of   such  a  random  variable  can  have  is  a  .   But  if 


■"  See  [3],  p.  223. 
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the  random  varirble  has  a  uniform  distribution  on  this  interval, 
direct  computation  reveals  that  the  variance  is  only  a  /3 •   If 
the  distribution  is  somewhat  bell-shaped,  the  variance  may  be 
much  less,  with  zero  as  the  greatest  lower  bound.   Therefore, 
as  a  rough  approximation,  we  shall  in  the  present  instance 
take  the  variance  to  be  not  greater  than  a  /2.   That  is,  our 
appraisal  of  v  will  be 


(3.5)         V  ^  i 


(1  -  llHll 


where   the    right    side   is   obtained  by   referring   to    (3.I4.)", 

Another  appraisal   which  will   be   needed   relates  to  the 
mean  value    of   the   duration  N".      It   is   known ""that 


j=l  "J 


E(N"|j^=i)    =      2    (I+P+P'^+...+P^  ),  ,    . 
Therefore, 


n 

e{n'" 


■"■|j^=i)    ifz    (1+?+?"^+...).  . 
j=l  ^^ 

<  II1II+    l|P|I+    l|P||2    +    ... 


1    -    IIP 


If  the  reader  prefers  to  work  vjith  a  bound  v/iiich  is  one  hundred 
per  cent  certain  not  to  be  exceeded,  he  will  have  to  coriib  through 
the  remaining  calculations  in  this  paper  and  replace  the  factor 
1/2  by  unity  wherever  (3.5)  is  used.   There  are  enough  safety 
factors  in  our  estimates,  insofar  as  avoiding  the  favoring  of 
the  Monte  Carlo  method  is  concerned,  so  that  this  ought  to  be 
unnecessary. 


'See  Curtiss  [3],  p.  226. 
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where  P  =  [p.  .].   This  formula  holds  good  for  any  I'iarkov 
process  with  absorbing  states  and  with  stationary  transition 
probabilities  given  by  a  matrix  such  as  P.   In  the  present 
case,  P  =  [ Ih.  .|],  so  our  upper  bound  for  the  mean  deviation  is 


(3.6)      E(N"|J  =  i)  =  i 

1  -  llHll 


It  should  be  noted  that  (3.6)  becomes  an  equality  if  the 
sura  of  the  elements  of  the  i-th  row  of  P  =  f Ih.  .|]  is  constant 
for  1  =  1, . . . ,n. 

Incidentally,   the  reason  for  using  the  matrix  norm 
max  ZJh.  .|  instead  of  one  of   the  other  norms  should  now  be 

apparent.   The  natural  appraisals  for  both  v  and  the  mean 
duration  seem  to  involve  this  particular  norm* 

The  conditional  variance  of  N",  given  that  J  =  i,  has 
the  folloT-jing  bound   in  the  case  K  =  co  : 

(3.7)    Var(N''''|j  =i)  <  — ^1+E(n'''"|J  =1)1e(N''''j  =i ) 

"1-llpll    '^        0.0 


2 


1-llpll 


-  [E(N''''|j^=i)}2   . 


In  view  of  certain  safety  factors  in  this  formula,  mq    shall 
accept  the  follovdng  simpler  heuristic  appraisal  of  the  vari- 
ance, obtained  by  discarding  the  second  term  in  the  third 

'"see  Curtis s  [3],  p.  226, 
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member  of    O..?)    and  halving  the   first    term: 


(3.8)  Var    (n'""|j  =i)    <  — ^ 


o 


1-llHll 


Thus  our  appraisal  of  the  conditional  variance  of  the  dura- 
tion is  identical  with  our  appraisal  of  the  conditional  mean 
value  of  the  duration. 

In  concluding  this  section,  we  shall  make  two  general 
remarks  about  the  Honte  Carlo  method  for  solving  ^=  B.^-,  +   y 
developed  above. 

In  the  first  place,  one  of  the  desirable  features  of  any 
Monte  Carlo  solution  is  to  achieve  an  arrangement  vhereby 
the  more  that  is  known  about  the  solution  of  the   problem  in 
advance,  the  smaller  the  variance  of  the  statistical  estimator 
is,  with  zero  variance  attained  in  the  presence  of  full  know- 
ledge of   the  solution.   Such  an  arrangement  has  been 
achieved  in  the  present  case.   If  the  solution  is  known  in 
advance,  then  p      =  p   -   0,    and  consequently  v  =  0.   The  inequality 
(3*5)  gives  a  bound  for  v  which  depends  on  the  square  of  the 
norm  of  the  zero-th  residual  vector,  and  thus  the  better  the 
initial  estimate  or  guess  is,  the  smaller  the  variance  is". 

In  the  second  place,  our  Monte  Carlo  solution  has  an 
automatic  self-correction  feature  similar  to  that  of  the 


Another  minimum  variance  Monte  Carlo  solution  of  the  problem 
A^  =  n.  is  presented  in  Curtiss  [3],  pp.  227-231.   The  present 
arrangement  seems  to  be  simpler  and  somewhat  easier  to  use  in 
practice . 
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iterative  method  based  on  (2.I4.).   If  an  error  is  made  in 
computing  the  Z  for  any  one  sample  walk,  this  erroneous  Z 
merely  is  incorporated  into  the  average  of  a  great  many  other 
realizations  of  the  same  random  variable,  and  its  effect  will 
ordinarily  be  negligible. 

l\..    The  a  priori  estirr.ation  of  the  required  amount  of  work* 
In  the  fourth  of  the  ground  rules  stated  at  the  beginning  of 
Section  3j  we  announced  that  the  measure  of  approximation  to 
be  used  in  the  case  of  the  iterative  method  vjould  be 

11?-    -  ?,tII  >    and  in  the  case  of  the  stochastic  method,  it 

'  CO  N  ' 

would  be  I  ;%  ]  .  -  Z  I .   We  shall  now  state  more  explicitly 

CO  IV  i-         J 

just  hov/  W8  are  g'^ing  to  use  these  measures  of  approximation. 

The  general  idea  is  that  in  each  cace,  the  computation 

is  to  proceed  until  the  approximate  solution  is  suitably 

close  to  the  exact  solution,  and  the  definition  of  "suitably 

close"  used  in  each  case  will  be  comparable.   Specifically, 

given   a   small   nuraber  d   >  0,    we   propose    that   the    iterations 

in  the  non-ctochactic  method  shall  be  carried  on  until  finally 

11"'-'         -  HfTll  <  d,    and  vjq   propose    that   the    sampling   of   the   Markov 

process    shall   be    continued  until   finally    1^      ] .    -  ^    I    <  d. 
■^  .  "^        ^^  00    1  V 

But    the    vector     K        is   unknown.      We   must   therefore   trans- 

co 

late  our  measures  of  error  into  terms  of  the  data  of  the 

problem  and  the  initial  esbimate  ?,  .   To  achieve  a  theoretical 

o  

rather  than  empirical  comparison,  we  shall  restrict  ourselves 
entirely  to  an  a  priori  error  analysis. 
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The    error    analysis    and    consequent    appraisal    of   the    amount 
of   work   required   to    achieve    a  given   accuracy.,    is    of  necessity- 
carried   out    very    differently   in  the    case    of    the   two   methods. 
In  the    case    of    the   non-stochastic   raethod,    we    base    the    analysis 
en  the    inequality    1,2.13).      With   an   eye    on  this    inequality,    we 
seek   the    lowest   value    of   N   such  that 


llHll 


N 


1    -      H 


lip  II     <   d      . 


Taking  logarithms  of  both  sides,  we  find  that  the  required 
value  of  N  is 


ik'l) 


N      =   1    + 
o 


log  iSaT  +    1°S(1    -   llHll    ) 


ilpti 


log    II  lill 


where    the    square   bracket   here    T^eans    "largest    Integer   in". 
(The    logarithiiis    can   be    ta':en   to    any    convenient   base,    as    for 
example,    10. ) 

We  must  therefore  carry  out  N   iterations  of  the  recursion 

formula  ^^t^-,  =  H;^  +  y   As  pointed  out  at  the  end  of  Section  2, 

2 
each  iteration  counts  as  n  multiplications.   However,  since 

we  have  set  ourselves  the  peculiar  problem  of  finding  only  one 

component  of  the  solution  vector,  the  last  iteration  (in  wMch 

^.,   is  computed  from  ^-,  _-,  )  can  be  abbreviated  to  just  n 
o  '  o~ 

multiplications.   The  formula  for  A      involves  ||p  ||  ,  and  it 
seems  reasonable  to  suppose  therefore  that  in  using  this  error 
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analysis  in  practice,  p   ~  P      would  always  be  computed  at  the 

2 

outset.   This  would  take  n  more  multiplications.   Thus  the 

grand  total  of  the  number  of  multiplications  required  a  priori 
to  achieve  the  inequality  ll-'y  „  *  ^'W^'  "^  '^  ^^ 


ik'?-)  rn  =  (N^-l)n^  +  n  +  n^ 


2        2 

=  n  +  n  +  n 


log  j^  +  iog(i>  IIhH) 
log  IIh  II 


We  now  attack  the  analogous  problem  for  the  stochastic 
method. 

The  statistical  estimator  Z   is  given  by  the  formula 

"v  -  -oil  * ; . 

where  Z-,,  Z^,    ...,  Z   are  v  mutually  independent  determinations 
of  the  random  variable  which  appears  in  the  right  member  of 
(3»3)»   The  mean  value  of  Z   is  ^'■'^]-    for  all  practical 
purposes.   It  will  not  be  possible  to  adjust  v  so  as  to 
assure  ourselves  a  priori,  given  any  d  >  0  however  small, 
that  Z   will  deviate  from  its  mean  value  by  less  than  d. 
We  must  therefore  have  recourse  to  the  theory  of  statistical 
estimation. 

Probably  the  easiest  way  to  approach  the  question  is  to 
demand  that  a  priori,  the  probaoility  of  a  deviation  of  less 
than  d  shall  be  at  or  above  some  predetermined  rather  high 
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level.      Specifically,    we    choose    a    small    number  p,    and   require 
that    V    shall  be    taken   as   the    lowest   value   for  which,    a  priori, 
the   folloxijing   ine4.uality  holds: 


Pr(|r,^]^    -   Z^l<d)    >   1    -   2p      . 


Now  Z  is  a  constant  plus  the  average  of  v  independent, 
identically  distributed  random  variables,  each  with  a  finite 
variance  v.   It  therefore  follows  from  a  well-knovm.  result  in 

probability  theory  called  the  Central  Limit  Theorem"  that 

—      —     1/2 
(Z   -  E(Z  ) )/v  '  is  approximately  distributed  according 

to  the  normal  or  Gaussian  distribution,  where  v  denotes  the 

'         V 

conditional  variance  of  Z  ,  given  that  J  =  i.   The  approxi- 
mation is  ordinarily  very  good  for  v  >  100,  and  in  all  of  our 
subsequent  applications  of  this  theorem  we  shall  be  dealing 
v;ith  values  of  v  much  greater  than  this.   At  v/orst,  the  effect 
of  a  poor  approximation  wo  ild  be  merely  to  deceive  us  by  a 
few  one  hundredths  as  to  the  value  of  the  probability  level  p 
which  is  really  in  effect. 

The  variance  of  a  constant  plus  a  random  variable  is  the 
same  as  the  variance  of  the  random  variable  alone.   Therefore 

the  variance  v   of  Z   is   equal  to  that  of  the  average  of  v 
V      V       ^  ^ 

independent  determinations  of  the  random  variable  Z.   A 


'"'see  e.g.  [2],  Chap.  1? . 
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familiar  formula  of  statistical  theory'  then  states  that 
V  =  v/v,  where  as  in  Section  3,  v  is  the  conditional  vari- 
ance of  Z,  given  that  J  =  i. 

Putting  the  above  facts  together,  we  have: 


Pr(  [',      ],  -  Z  I  <  d) 
^  '  -^co  i    V  ' 


P^^  T/T-  "   -172, 

1e(Z  )  -  Z  I  .  1/2  \ 

n   ,    ^   V        V '  ^  dV  '     ^ 

^^  '  1/2 <  —172 

%  V 


,  1/2  /  1/2          , 

,  dv  "^   /  V  '      T     ,  2 /„ 

/'  4:::  e"^  /^  dt 

^   ,  ,1/2  /  1/2    N2tt 

-dv  '       /  ^  ' 


1-2  — ^  e    '   dt  . 


At  this  point  v/e  shall  raake  an  arbitrary  decision  about 
the  level  of  certainty  1  -  2p  which  is  to  be  demanded.   If 
X  =  2,  then 

00  ^ 


X 


/    -~   e"^  /2  dt  =  .01^55 


A  level  of  certainty  equal  to  1  -  0.0[(.55  -  »9SkS  seems  more 
than  adequate  for  present  purposes,  in  view  of  various  other 
safety  factors  which  are  embodied  in  our  appraisals. 

See  e.g.  [2],  p.  3hS» 
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Thus  we  are  seeking  the  S-:aile3t  value  of  v  such  that 


dv 


1/2 


V 


1/2 


>  2  . 


This  value  of  v  is 


(1^.3) 


.2 


+  1 


where  again  [  ]  means  "largest  integer  in". 

This  is  the  number  of  independent  sample  realizations  of 
the  Markov  process  needed  to  achieve  the  demanded  order  of 
accuracy  with  a  probability/  of  at  least  95 '^5  h   • 

To  estimate  the  aiaount  of  x^rork  required  to  attain  this 

level  of  accuracy,  we  must  mal-:e  some  further  ass\iinptions  as  to 

how  the  Monte  Carlo  coicputations  v;lll  be  cameu  out.   From 

(3«3),  we  see  that  each  sai.ple  will  (alriiOst  surely)  involve 

computing  some  one  of  the  numbsr  r./p.  . ,  ,  i  =  l,...,n.   It 
°  11, i+n  ' 

seems  logical  therefore  to  assume  that  the s^  quantities  will 

2 

all  be  calculated  in  advance.   It  v;ill  require  n  multiplications 

to  compute  p  ,  given  %    ,    and  thereafter  it  xirill  require 

of  n  multinlications  to  get  the  ..uotients  r./p,  .   . 

'^  1  -^  1,1+n 

V/e  m.ust  now  come  to  an  c.greement  as  to  how  r.mch,  work  is 
involved  in  following  each  random  walk  J  , J, , . . .  to  absorption. 
It  seems  to  be  not  unreasonable  to  assume  that  each  step  before 
absorption,  and  the  step  in  which  absorption  takes  place,  always 
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requires  the  equivalent  c.f  one  niulti  ijlicition .   Of  course, 

after  absorption  takes  place  in  one  of  the  states  numbered 

n+1,  n+2,  ,..,  2n,  no  more  computations  are  required  for  the 

particular  realization  of  the  Markov  process  at  hand,  and  a  new 

independent  sample  is  started.   In  other  words,   each  complete 

random  walk,  represented  by  J  ^  J-,  ^  Jp  ^  . .  .  ^  J  ,;*   »•  J  „  = 

°    -^    ^  N"-l    N" 

=  J  .,    +  n,  will  require  N"  multiplications. 
N'~-1 

In  ascribing  to  each  step  the  equivalent  of  one  multipli- 
cation, we  have  in  mind  the  fact  that  to  select  the  value  of 
J,  , -,  ,  given  J,  ,  a  pseudo-random  number  will  presumably  be 
generated  and  certain  comparison  operations  will  have  to  be 
performed. 

Putting  these  assuriiptions  together,  i-je   find  that  the  total 
amount  of  work  required,  measured  in  multiplications,  is 


ik.k)  n^  +  n  +  Ij"  +  n"  +  ...  +  n"' 


where  11^,    ...,  N^  are  v   independent  determinations  of  the 

random  variable  N"  introduced  in  Section  3.   This  is  a  random 

variable  whose  (conditional)  mean  value  is  n   +  n  +  v  E(N"|J  =i ) 

o     '  o 

and  whose  (conditional)  variance  is  v  Var(N"|j  =i ) . 

o       '  o 

In  Section  3  (e«p>  3*7)  we  arrived  at  an  api:)raisal  of  the 
magnitude  of  Var(K'  |  J  =i )  wliich  vras  juct  the  same  as  our 
appraisal  of  the  ma'jnitude  of  E(N"|J  =i ) .   Interpreted  very 
broadly,  and  giving  our  appraisals  more  credit  for  sharpness 
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than  they  probably  deserve,  this  means  that  if  the  same  problem, 
\  =   Hf.  +  Y»  were  solved  over  and  over  by  the  i-Ionte  Carlo  method, 
the  amount  of  x<rork  could  b^^  expected  to  exhibit  statistical 
fluctuations  around  its  mean,  of  a  magnitude  as  great  as  some- 
thine;  like  2  or  3  times  the  square  root  of  v  Edl'lj  =i ) . 

o  -^  -1  O         0 

Qualitatively  speai<:ing,  v;e  can  state  with  some  assurance  that 

the  amount  of  work  required  to  achieve  a  given  accuracy  would 

vary  greatly  from  trial  to  trial  if  the  solution  by  Monte  Carlo 

methods  v:ere  to  be  carried  out  over  and  over.   Due  to  the 

effect  of  the  Central  Limit  Theorem^  as  applied  to  (I|..l-(.)  if  the 

Monte  Carlo  solution  v;ere  to  be  carried  out  over  and  over  again, 

about  half  of  the  time  the  total  amount  of  i-;ork  (excluding 

preliminary  preparations)  would  be  less  than  v  E(N"1j  =i), 

and  about  half  the  time  it  would  be  more  than  this  quantity. 

It  would  practically  never  be  more  than  v  E(N"|j  =i)  + 
+  3{v^E(ir|j^=i)}l/2  . 

These  statistical  fluctuations  of  the  amount  of  work 
constitute  just  one  more  obstacle  to  a  comparison  between 
stochastic  and  non-stochastic  methods  of  solution  of  linear 
equations.   It  seems  logical,  ho^^rever,  to  settle  on  the  mean 
value  of  the  random  variable  in  ([|-.[|.)  as  the  most  suitable 
representative  o±    the  amount  of  ^^^ork  in  the  stochastic  method 
to  use  for  comparison  purposes,  and  this  vie    shall  do. 

Our  formula  for  the  mean  number  of  multiplications  required 

a  -nriori  to  acsure  that  \y      .  -  Z  |  <  d  is  thus 
-  ' '  -OO  1     V 


38. 


(1+.5) 


n  +  n  + 


d2 


+  1)  E(N'""|j^=i) 


Into  this,  we  substitute  the  bounds  given  by  (3*5)  and  (3.6), 
wliioh  are  in  terms  of  the  data  of  the  problem.   V/e  finally 
arrive  at  the  formula 


([l.6)        ra  =  n^  +  n  +  <1   + 

1-llHil 


2   o 


Ld^d-llHlD^J 


which  is  our  basic  estimate  of  the  (mean)  amount  of  work  re- 
quired in  the  Monte  Carlo  method. 

5*  Numerical  com.parisons «   It  uill  j^ow  be  convenient  to  express 
the  number  d  (which  measures  the  desired  closeness  of  approxi- 
mation) as  a  percentage  or  fraction  of  the  norm  of  the  initial 
residual  vector  p  =  p  .   That  is  because  ||p  |1  and  d  appear  in 
our  formulas  for  amount  of  v7ork  only  in  a  ratio.   Thus  we  let 


r  llpll  ,   r  >  0. 


The  goal  of  the  non-stochastic  iterative  method  can  now  be 


-^.T 


until  it  becomes  less 


phrased  as  beinr-  to  reduce  11^     ^.,^ 

^  '  ^00    -N 

than  some  suitably  small  multiple  r  of  the  largest  element  (in 

absolute  value)  of  the  initial  residual  vector.   The  goal  of 

the  stochastic  method  will  be  to  reduce  |^j   ]   -  ~  |  by  repeated 

i 
sampling  until  with  hi^;h  probabil_ty  ( specif  ically,  a  probability 

of  95  yo  ),  it  too  becomes  less  than  the  ssitie  small  multiple  r 
of  the  largest  element  of  p  =  p  . 


39. 


With  this  agreement,  we    i-'e capitulate  our  basic  formulas 
for  msasurinfj,  the  amount  of  worl:.   The  upper  bound  for  the 
amount  of  work  required  in  the  non-stochastic  iterative  method 
is 


(5.1) 


2        2 
m  =  n  +  n  +  n 


where  h  -  ii  Hj] 


log  r  +  lop;(l-h) 
log  h 


The  upper  bound  for  the   mean  ai-aount  of  v.-ork  required  in  the 
stochastic  method  is 


(5.2) 


m  - 


-   4.       O.     1 

n  +  n  + 


r 


l-h 


1  + 


r^(l-h)^ 


In  each  forinula,  the  square  bracliets  mean  "largest  integer  in" 

The  quantity  in  braces  in  (5.2)  represents  v  ,  the  total 
n-omber  of  times  the  I-Iarkov  process  must  be  sariipled. 

Perhaps  a  more  natural  formulation  of  the  goals  of  the 
iterative  riiethod  and  the  Monte  Carlo  method  from  the  purely 
theoretical  point  of  view  would  be  obtained  if  instead  of 
requiring  that  the  inequalities  W^  ^    -  ^-mH  <  ^11  p  II  and 
1?:   ]   ~  2  I  <  r  lip  II  shall  hold  (the  latter  v/ith  high  proba- 

bility),  we  required  that  the  inequalities  |1^   -  '^, 


'N 


<  r'  II?    -  £  II  and 

'  -  CO     -  0 


I  <  r.|l<^ 


-  r 


II  shall  hold 


CO  1     V  '     ■   "  '^  CO     '  o 
for  acme  specified  r'  >  0,   These  iiiodified  requirements  lead 

to  simpler  esti^iiates  for  the  total  ainount  of  x-/ork.   Proceeding 

In  the  spirit  of  our  previous  analysis,  v;e  use  the  relation 


it-0, 


ll?c.    -5o 


-1 


(I-H)-^    Poll  ^ 


II  Po  II 


1-h 


and  rephrase  the  new  requirements  as  follows: 


'  ^00  -^1    V        1-h 


Substituting   the    right-hand   members    of   these    inequalities    for 
d   in    (.'4.. 2)    and    (k.6)    respectively,    we    get 


m' 


2  2 

n     +  n  +  n 


log   r' 


log  h 


and 


-  -     2    ,         ^     1 

m  -  n     +  n  + 


1-h 


1   + 


i 


Of  course  in  practice,  \\  c  ^  -   E,  II  is  itself  not  computable 
before  the  solution  is  l:no'wn,  so  the  new  requirements  villi 
always  have  to  be  trrjislated  into  terms  of  ||  p  i|  and  h,  just 
as  they  were  in  the  above  theoretical  error  analysis.   This 
essentially  reduces  the  new  set  of  requirement s  to  the  old  ones, 
with  an  intermediate  appraisal  thro'.-jn  into  the  picture.   There- 
fore at  the  e;:pense  of  a  slight  complication  in  our  formulas, 
we  choose  to  assur.ie  that  the  required  degree  of  approximation 
is  expressed  in  terms  of   a  m.ultiple  of  the  computable  quantity 
II  p^  II  rather  than  in  terms  of  a  multiple  of  the  non- computable 
quantity  ||  t        "  <^^ll  . 


CO 


1;1. 


In  addition  to  the  iterative  and  Monte  Carlo  methods  of 
solving  ^=  H^  +  Y>  we  promised  in  the  ground  rules  in  Section  3 
that  a  non-iterative  direct  method  will  be  brought  into  the 
comparison  as  a   sort  of  standard  of  reference.   The  method  we 
propose  to  consider  is  the  Gauss  Elimination  Method".   It 
seems  to  be  the  particular  direct  method  best  adapted  to  the 
peculiar  problem  to  which  we  have  addressed  ourselves;  namely, 
that  of  computing  just  one  component  of  the  solution  vector. 

To  apply  it,  wo  might  proceed  as  follows:  We  are  seeking 
L      ] • »  for  some  fixed  i  .   Permute  the  columns  of  I-H  and  the 
components   of  ^  so  that  the  i-th  colmnn  of  I-H  becomes  the  n-th 
one  and  the  i-th  component  of  "i^  becomes  the  n-th  one.   Tri- 
angularize  the  (nev;)  matrix  I-H  as  in  the  first  part  of  the 
Gauss  elimination  method,  always  using  leading  row  elements  as 
pivots.   At  the  end  of  the  triangularization  procedure,  which 
requires  approximately  n- /3  +  n  multiplications"'",  the  co- 
efficient of  the  desired  element  is  sitting  out  in  the  open, 
so  to  speak,  at  a  vertex  of  a  triangular  array,  with  nothing 
but  zeros  for  the  other  terms  in  its   row.   Of  course  at  the 
same  time  y  niust  be  suitably  transformed. 

It  would  require  only  about  (l/2)n  more  multiplications 
novi   to  get  the  rest  of  the  components  of  the  solution,  but  for 

"See  for  example  Dwyer  [5],  Section  S.l].. 

I'D? 

""The  exact  coimt  depends  on  the  order  in  which  the  arithmetic 
operations  are  carried  out. 


1^2. 


present  purposes  we  irnore  the  fact  that  a  complete  solution 
would  lie  so  near  at  hand  at  this  point. 

The  elimination  solution,  as  we  said  in  Section  3,  is 
assumed  to  be  exact.   No  questions  of  approximation  (which  for 
large  matrices  in  practice  w:.ll  indeed  arise  because  of  round- 
off error)  will  be  considered  here. 

As  indicated  above,  our  formula  for  the  amount  of  vjork  in 
the  direct  solution  is   then 


(5.3)  m  =  ^  +  n2 


Nov:   the  key  to  our  vjhole  comparison  is  that  {S'2>)    is   a 
third  degree  polynomial  in  the  order  n  of  A  and  of  H,  and  (5'1) 
is  a  second  degree  polynomial  in  the  order  n  of  A  and  H,  and 
the  term  in  m  in  (5»2)  which  represents  the  strictly  stochastic 
part  of  the  tjolution  is  independent  of  the  order  of  A  and  H. 

It  follovrs  that  x-jith  reasonable  values  of  h  and  r,  the 
direct  method  will  be  more  econoriiical  for  small  values  of  n, 
the  non-stochastic  iterative  method  for  intermediate  values  of 
n,  and  the  Ilonte  Carlo  method  for  large  values  of  n.   The 
formulas  foi'   the  break-even  points,  obtained  by  equating  our 
estimates  for  the  amount  of  work,  are  as  follows: 

The  amount  of  i;ork  for  the  Gauss  elimination  method,  as 
estim.ated  by  {S'?>)i    ^"-   less  than  thr.t  for  the  stationary  linear 
iterative  method,  as  estimated  by  (p.l),  for  values  of  n  in 
the  interval 


1+3 . 


(54) 


1  <  n  < 


3a  +  (9a'''-  +  12)^^^ 


whe  re 


a  = 


log  r  +  lop;  (1-h) 


log  h. 


It  is  greater  th.ia'i  that  for  either  the  linear  iterative  method 
or  the  iionte  Carlo  method  for  valuer;  of  n  exceeding  the  right 
member  of  (^.'l-)  • 

The  air.ount  of  vjork,  ac  estimated  by  (5«1),  for  the  station* 
ary  linear  iterative  method  is  less  than  the  m.ean  amount  of 
work  for  the  Iionte  Carlo  method,  as  estimated  by  (5«2),  for 
values  of  n  in  the  interval 


(5.5) 


1  <  n  <   {^)V2 


where 


f 


l-h 


1   + 


}^ 

r'^d-h)!] 


It    is   greater   than  tjie   m.ean   a   ount    of   V70rk   for   the    Monte    Carlo 
method  for  values   of  n  exceedizig  the   right   member  of    (5»5)» 

In  the    accompanying   table,    vre   list   the   numerical   values   of 
these    limJ-ts,    togetlier   with    some    related  qufintities,    for   various 
typical    valuer,   of  r   and  h.      In  one    case    -   that    in  v/hich  h  =  9/lO, 
r   =   l/lO    -   the    linear   iterative   riethod    alvjays   requires    (by   our 
a  priori    estimates)    more   laultiplications    than   some    one    or   both 
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of  the  other  two  methods.   (I'/e  are  referring  to  the  mean  amount 
as  usual  in  the  case  of  the  Monte  Carlo  method.)   The  break- 
even dimensionality  for  the  Monte  Carlo  method  x-jas  computed 
in  this  case  by  equating  (^.1)  and  (5»3)» 

It  is  important  to  notice  that  the  measure  of  work  for  the 

2  2 

Monte  Carlo  method  vri.ll  increase  only  as  n  ,  and  not  as  an  ,  a^l. 

p 
The  term  n"  in  (5*2)  represents  the  work  required  to  prepare 

the  vector  p   before  the  stochastic  estimation  procedure  is 

begun.   If  one  is  willin<-^  to  content  oneself  with  Z,      =  0  as 

the  initial  estima te,  then  no  multiplications  vjhatever  are 

needed  to  find  p  =  p  ,  and  the  term  n  in  (5'2)  drops  out« 

Under  the    circumstances,    the   total  mean  amount    of  vjork   required 

by  the    Honto    Carlo  method   increases   only   as   the    first   povjer   of 

n.      If  we    also   decide   not   to   calculate  the   numbers   r./p.    .  , 

1.'  -^1,  i+n 

In  advance,  but  only  as  needed  in  the  sampling,  then  all  direct 
formal  dependence  of  the  mean  ainount  of  work  in  the  Monte  Carlo 
method  on  n  disappears. 

The  reader  should  be  warned  not  to  try  to  check  the  table 
for  consistency  by  assum.ing  that  tvjo  stages  of  a  reduction  in 
the  magnitude  of  \[i,      -   ^-,7 II  by  an  amount  r||  p  |i  ,  using  the 
approximate  solution  of  ^  =  li%   +  y  obtained  in  the  first  stage 
as  the  ^  for   the  second  stage,, should  theoretically  require 
just  the  same  amount  of  vjork  as  a  one -stage  reduction  in 
11'^    -  '^o-KiW     t>y  an  amount  of  r  II  f^  II  •   Let  N  be  the  number 
of  iterations  required  by  the  first  stage.   The  methods  used 


^6, 

to   compute   the    table   would  place    N      at    the    smallest    value    com- 
patible  vjith 

N 
(5.6)  T^^   ''      • 


If  N'  is  the  number  of  iterations  required  to  effect  a  one-stage 

O  J.  o 

reduction  in  |1^,   -  ^;^  1|  by  an  amount  r  1|  p  |1 ,  the  methods  used 

to  compute  the  table  would  place  N'  at  the  smallest  value  com- 
^  ■       o 

patlble  with 

<  r 


h  °     2 


1-h 


This  inequality  is  the  same  as  the  follox^^.ng  one: 


nV2 


l/? 
Since    (1-h)    ' '^'   <   1,    it    follovjs   by    comparison  vjith    (5»6)    that 

n'/2  <  rl    . 
o  o 

It  should  also  be  pointed  out  that  to  ps-rform  a  liOnte 
Carlo  approximation :.  in  tv;o  stages  would  require  that  all  com- 
ponents of  the  solution  vector  must  be  estimated  in  the  first 
stage,  and  not  just  one  compoiient.   The  reason  is  that  to  set  up 
the  random  variable  Z  (see  (3.3))  for  the  second  stage  of  esti- 
mation, all  the  components  of  the  initial  residual  vector  for 
this  stage  miust  je  available. 


14-7. 


6.  An  analo,n:oiis  coi^iparlson  for  matrix  inversion.   If  the  problem 
is  to  c-olve  AX  =  I,  whei-e  A  is  a  ^ivcn  n  k  n  matrix,  a  suitable 
modification  of  (3«2)  on  which  to  construct  a  Monte  Carlo  solu- 
tion is  as  follows: 


(6.1)       X^  =  a"-^  =  X^  +  (I+H  +H^+...)  H  X   , 
00  o        o   o        o  o  ' 


where   X     is   an  initial   estimate    of  A~      and   H     =   I    -  X  A«      If 

o  0  0 

X      is    a   reasonably  ^ood   estimate    of   A      ,    then    |1h     ||    <   1,    and 

the    Infinite    series   in    (6.1)    convern;es.      VJe    assiirae   that     IIh    1I<   1 

~-  o 

throughout  the  remainder  of  t],  is  section. 

vie    set  up  the  numbers  z.  .  and  p.  .  in  terms  of  the  elements 
of  H   exactly  as  in  Section  3«   Assuming  that  vje .  are  trying  to 
approximate  the  (i,k)-th  element  of  X   ,  we  take  as  the  p  of 
formula  (3-3  )>  the  k-th  colum:-i  of  K  X  ♦   The  statistical  esti- 
mator will  now  be 


Z-,  +Z^+  • .  •  +z 

Z  =£,].+  ^-^ ^ 

V     o'l 


where  '(^   is  the  k-th  column  of  X  . 
-'  o  o 

The  stationary  linear  iterative  process  whicli  corresponds 
to  (6.1)  is  given  by  tjie  recursion  formula 


(6.2)  X-^^^  =  E^^   +  X^,   N  =  0,1,..., 


where  X,-  is  the  N-th  approximation  to  A   .   Obviously 


I|.8. 


A"-^  =  H  a"-"-  +  X  ,  so 
o       o 


^^"^  -  %  =  ^^e^^"^  -  hAf    '"    =   «o(A"^  -  ^o^  ' 


This  is  the  analogue  of  (2.3').   Since  A""^  -  X  =  (I-H)"-^HX  , 

0  0 

v;e  find  that 


(6.3)      A-1  -  x^^  =  P^^il'%)'\x^     . 


This  equation  is  the  analogue  of  (2.12). 

If  we   let  '^„  denote  the  k-th  column  of  X,,,  N  =  0,1,..., 
then  the  iterations  defined  by  (6.2)  give  the  folloviing  sequence 
of  approximation  to  .^   ,  the  k-th  column  of  A"  : 

00  ' 
(G.k)  :?,  j^^l  =  H^  ^j  +1^    ,   N  =  0,1,... 

Also,  from  (6.3), 

where  p  is  the  k-th  column  of  H  X  .   This  equation  is  formally 
identical  vd  th  (2.12).   Moreover,  from  it  we  find  that 

II  H  11" 

(^•5)         iif.co  -^Hi'^irt^  lip  II  • 

which  is  the  same  as  (2.13), 


1+9* 


If  we  now  define  our  problem  as  that  of  insuring  that 
11':^-^  -  'E.,   II  <  d  in  the  non-stochastic  method,  and 

CD        fj 

|H„  ]4  -  Z  I  <  d  in  the  stochastic  method,  where  d  >  0  is 
CD  1     V  ' 

preassigned,  then  the  a  priori  error  analyses  become  precisely 

the  same  as  those  given  in  Section  I4..   It  requires  n-^  multi- 

2 

plications  to  set  up  H  ,  given  X  ,  and  then  n  more  to  find 

p  =  H   .   However,  H^  will  be  used  again  in  the  non-stochastic 

method  to  pass  from.  E      to  ^,  ,   The  resulting  formula  for  the 

^o       1  '-^ 

total  amo;int  of  work,  including  the  preparatory  work,  becomes 
in  the  non-stochastic  case. 


(6.6) 


m 


n^  + 


n  +  n 


lop:  r  +  log  (1-h) 
log  h 


where   h  =    ||  H     H    and   r   =  d/  ||p||  •      In   the    stochastic    case    it 
becomes 


(6.7) 


m 


3  2 

=  n     +  n     +  n 


+ 


1-h 


1   + 


r^d-h)^ 


1: 


K 


J 


The  break-even  point  for  the  tv;c  methods  is  given  by  the  formula 
|b/(a-l)|-     ,   vjhere  a  and  b  have  the  same  meaning  as  in 
Section  5«   For  values  of  n  lesc  than  this  quantity,  the  non- 
stochastic  method  requires  less  work  than  the  stochastic  method, 
and  for  values  of  n  greater  than  this  quantity,  the  stochastic 
method  requiretj  less  work  than  the  non-stochastic  method. 


y- 
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In  comparing  these  formulas  vd.th  (5.1),  iS'2.)    and  (5 '5) 
it  should  be  remembered  that  in  arriving  at  the  earlier  work- 
estimates  (5'1)  and  (5 '2)  for  the  problem  A^  ~  ^  t    ^^^    assumed 
that  the  H  and  the  y  in  the  equivalent  form  %  =  Ef,     +  y  were 
given,  and  so  we  did  not  count  in  i^jork  required  to  find  them. 
Here  we  did  count  in  the  vjork  required  to  find  our  H  (denoted 
here  by  H  ) ,   (The  vector  y  is  here  K     ,    and  it  com.es  free, 
so  to  speak.)   The  methods  have  therefore  become  nominally 
unfavorable  as  coriipared  to  tlie  Gauss  elimination  method,  which 

for  the  present  problem  (finding  one  component  of  the  solution 

v/here     e\r   is    the   k-th    column   of    I.     ^  p 

of  AE,     =   e,  )    vjould   req  lire   rather   less   than  n  /3   +  n     multi- 

plications. 

We    can    sidestep    the    n-^   multlDlications    reqaired   to    eret    H   , 

^      ^  °    o* 

by  taking  X  as  a  very  simple  m.atrix  (maybe  even  X  =  I  if 
II  I-A  ||<  1).   But  vje  shoi.ild  state  here  that  the  real  motivation 
for  using  a  linear  iterative  method,  or  one  of  the  many  ortho- 
gonalization  and  gradient  -methods,  for  the  problem  A^  =  \7  ,  or 
the  problem  AX  =  I,  in  place  of  a  straightforward  elimination 
method,  usually  does  not  lie  in  a  theoretical  count  of  the 
number  of  operations  required  in  the  v;orst  cases.   It  lies  in 
the  fact  that  A  may  have  special  properties  (e.g.,  symmetry,  or 
many  zeros)  which  are  not  suitably  exploited  by  the  elimination 
methods.   VJe  are  corrpletely  ignoring  such  consider'ations  through- 
out this  study.   Another  motivation   sometimes  is  presented  by 
the  necessity  of  controlling  round-off  error.   (The  Monte  Carlo 
method  looks  very  good  from  this  standpoint.) 
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It   would   alr-o   be    possible    to   construct    a   iionte    Carlo 
solution   on  the    fcllovjing   i-earrangenent   of    (6.1): 


X_    =  A 

00 


-1 


-t2 


I-.H^+H-+...)X^      . 


The    vector   forin  of   this   equation  Is 


00 


(I    -H    H       +    H^    +     ...)4n> 


where    'S         and  H      have   the    sasne   meaning   as    before.      This  pro- 
'  00  ^  o  ° 

cedure   would   avoid  the  necessity   of    calculatin.c!;  Hg       in  advance, 

and   so    the   n     term  vjould  drop   out    of    (6.7).      The   numbers   z.  . 

ij 

and  p.  .,  and  the  random  variables  J  ,  J.,,...   would  be  set  up 
1  j'  o'   1'  -^ 

as  in  Section  3j  t)ut  in  the  random  variable  Z,  the  components 
of  p   vjould  be  replaced  by  those  of  ^  ,  and  the  estimator 
Z    ,    ^    ].    T-;ould  be  replaced  by  zero.   V/ith  these  changes,  the 
estimate  (3«.>)  o£   the  variance  of  Z  becomes 

1    II  5  JJ^ 

V  < 

-  ^  (1  -  II  H  11)2 


and  formula  (I4..6)  for  the  mean  amount  of  work  (now  augmented  by 

the    calculation   of   H      but   decreased   by  the    amoujit    of   i^/ork 

o  ■^ 

previously  necessary   to    calculate  p)    becomes 


m 


3 

=  n^    +  n 


1-11  H 


1    + 


2iiy,jr 

d^d-llH     11, 
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The  disadvantage  of  this  arran[;eriient  is  that  it  does  not 
exploit  the  fact  that  v  varies  with  the  square  of  the  norm  of 
whatever  vector  is  playing  the  role  of  the  vector  0  of  Section 
3,   Therefore  the  goodness  of  the  initial  estimate  is  here 
made  use  of  to  reduce  the  statistical  fluctuations  and  conse- 
quent mean  amount  of  work  only  through  the  effect  it  has  on  the 
value  of  1/(1-11  H  II  )  . 

These  remarks  suggest  a  more  general  comment  which  is 
perhaps  the  key  to  all  the  developments  in  this  paper.   The 
statistical-  part  of  the  ajnount  of  vjorh  required  by  the  Monta 
Carlo  metiiod  to  achieve  a  given  accuracy  in  computing  one  element 
of  a  solution,  is  independent  of  the  dimensionality  of  the 
problem.   Other  known  methods  vai^^r  as  the  square  and  cube  of 
the  dimensionality,  and  tj  ose  which  vary  as  the  square  do  so 
with  a  proportionality  constant  much  larger  than  unity.   There- 
fore if  one  uses  the  Ilonte  Carlo  met:  od,  one  can  afford  to 
make  substantial  prelindnary  preparations,  involving  an  amount 
of  work  which  varies  even  with  the  square  of  the  dimension- 
ality, if  these  preparations  x-jill  substantially  cut  down  the 
error  in  the  subsequent  statistical  estimation  procedure. 

For  the  sa]:.e  of  completeness,  we  shall  bring  into  the 
comparison  a  certain  class  of  non-linear  Iterative  processes 
for  com.puting  A   wl^lch  theoretically  converge  much  faster  than 
the  linear  iterative  process  (o.2)  for  a  given  initial  estimate 
X  .   A  typical  member  of  the  class  is  defined  by  the  recursion 
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formula 


(6.8)     Xjj^^  =  {I+H,,+h2+...+H^^"1)Xj^,   N  =  0,1,2,..., 


where  H-  =  I-X-rA,  and  s  is  some  integer  not  less  than  2.   If 

ill  IM 

s  =  2,  the  fonnula  becomes  S„_|_.  =  (  2I-X^-A)X,t,  vjhich  is  mentioned 
in  most  textbooks  on  num.erical  analysis  as  an  analogue  of  the 
Nevrton-Raphson  method  for  finding  the   roots  of  non-linear 
single  equations  in  scalars' ". 

The  clue  to  an  a  priori  error  analysis  for  (6.8)  lies  in 
observing  that 


'h   =   ^    -   V  =  ^  -  (^^Vl-^^-l-^'-'^^-l^^N-l^ 


I  -  (M.^_i+...^-H^.i)(i-n-..,) 


Therefore  by  back  substitution, 

s^ 

N      o 


Our  presentation  of  these  polynomial  iteration  procedures  will 
be  sli~litly  different  from  that  usually  encountered  in  the 
literature,  so  as  to  line  thorn  up  with  (6.1)  and  (6.2).   The 
usual  presentation  replaces  our  K..  =  I-X,,A  by  I  -  AX--.   A 

nuiaber  of  references  relating  to  these  methods,  as  vjell  as  to 
all  other  methods  discussed  in  t'  i  s  paper,  will  be  foand  in 
Porsythe  [6] , 


See  e.g..  Householder  [71,  PP ♦  36-57 • 
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Nox.'   a"^    ■   ^w   ^    ^^    "   ^'p'*^)^'"'"   ^   V~^'    ^^   -^o^  =   I    -   H^,    so 

A~      =    (I    -  H    )~  X   .      Prom  all   this   we   obtain: 
o  o 

N 

A"^    -   X,,   =  K   ^    (I-H    )"V.      =   h|J'^(I-H^)"^H  X^    . 
N  00  o  00  00 


This   equation   is   the    analogue      of    (6.3).      Prom  it   we   get    in 
place    of    (6.5), 


^co- 

1    H 
"r        <  -       ° 

s^-l 

s-N      =      ^    _ 

l«o 

(6.9)  11^         I    1U.__^_^    lip  11      , 

o 

and  this  clearly  represents  a  much  faster  rate  of  convergence 
than  (6 .5) • 

The  difficulty  is  that  each  iteration  of  (6.6)  requires 
sn"^  multiplications.   Moreover  in  the  special  problem  at  hand  — 
that  of  finding  only  one  element  of  A   —  the  method  does  not 
appear  to  good  advontage,  because  there  seems  to  be  no  way  to 
avoid  computing  all  the  elements  of  the  m.atrix  X„  each  time, 
and  not  just  the  k-th  column,  as  v/e  did  in  the  linear  iterative 
method.   In  other  words,  there  seems  to  be  no  direct  analogue 
of  the  vector  recursion  formula  (6.[i.)  in  the  method  given  by 
(6.8). 

The  formula  for  the  total  amount  of  work  required  to  malce 
the  right  hand  menber  of  (6.9)  less  than  r|lp||  ,  where  r  >  0  is 
preassignedj  is  as  follows: 
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(6.10) 


3     ^ 


log 

(1 

+ 

los 

r 

+ 

1( 

5R 

(1- 

-h) 

) 

log 

h 

) 

log 

s 

and  h  -   ||Hq'j  . 


where  as  usual,  ths  square  bracicets  means  "largest  integer  in". 
A  study  oi""  the  maximum  and  minimum,  of  this  expression,  considered 
as  a  function  of  s,  reveals  that  s  =  2  or  s  =  3  usually  are  the 
most  advantageous  values  of  s  to  use.   For  example,  if  r  =  10  -^ 
and  h  =  v/lO,  then  the  foi-'nula  (6.10)  becomes 


sn 


3 


1  + 


lor;  s 


J  J 


•3  -) 

If  s  =  2,  this  equals  l[;n-^ .   If  s  =  2,,    it  equals  iSn^  *      If  s  =  I4., 
it  equals  l6n-'^.   For  higher  values  of  s,  the  disadvantage 
becomes  I'lore  pronounced. 

i-Jith  r  =  10  ' ,  h  =  9/10,  s  =  2,  the  amount  of  work  required 
by  the  non-linear  iterative  method  given  by  (6J4.),  as  estimated 
from  (6.10),  is  less  than  required  by  the  linear  method  given 
by  (6.).|.),  as  estimated  from  (6.6),  for  n  ^  6.   It  is  greater 
for  n  >  7. 


»f.'  i'  ~ 
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